Revisiting Fisher Kernels for Document Similarities
Authors
Abstract
This paper presents a new metric for computing similarities between textual documents, based on the Fisher information kernel as proposed by T. Hofmann. By taking a new point of view on the embedding vector space and proposing a more appropriate way of handling the Fisher information matrix, we derive a new form of the kernel that yields significant improvements on an information retrieval task. We apply our approach to two different models: Naive Bayes and PLSI (Probabilistic Latent Semantic Indexing).
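For reference, the following is a minimal sketch of the standard Fisher kernel (in the sense of Jaakkola and Haussler) for the simplest document model the abstract mentions: a single multinomial over words, i.e. a one-class Naive Bayes unigram model. It is not the revised kernel this paper derives. It assumes that the parameters θ_w are treated as free (ignoring the sum-to-one constraint), that the Fisher information is approximated by diag(1/θ_w), and that θ is an add-α smoothed unigram model fitted on the whole collection. All function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def empirical_dist(tokens: List[str]) -> Dict[str, float]:
    """Length-normalised word counts p^(w|d) for one document."""
    counts = Counter(tokens)
    n = len(tokens)
    return {w: c / n for w, c in counts.items()}


def corpus_unigram(docs: List[List[str]], alpha: float = 1.0) -> Dict[str, float]:
    """Add-alpha smoothed unigram model theta_w = P(w) over the collection vocabulary."""
    counts = Counter(w for d in docs for w in d)
    total = sum(counts.values()) + alpha * len(counts)
    return {w: (c + alpha) / total for w, c in counts.items()}


def fisher_score(p_hat: Dict[str, float], theta: Dict[str, float]) -> Dict[str, float]:
    # Gradient of the normalised log-likelihood sum_w p^(w|d) log theta_w
    # with respect to each theta_w (theta treated as unconstrained).
    # Assumes every word of the document appears in theta.
    return {w: p / theta[w] for w, p in p_hat.items()}


def fisher_kernel(doc_a: List[str], doc_b: List[str], theta: Dict[str, float]) -> float:
    u_a = fisher_score(empirical_dist(doc_a), theta)
    u_b = fisher_score(empirical_dist(doc_b), theta)
    # Approximate Fisher information diag(1/theta_w) has inverse diag(theta_w),
    # so K = U_a^T I^{-1} U_b = sum_w p^(w|a) p^(w|b) / theta_w over shared words.
    return sum(u_a[w] * theta[w] * u_b[w] for w in u_a.keys() & u_b.keys())


docs = [
    ["fisher", "kernel", "document"],
    ["fisher", "kernel", "retrieval"],
    ["music", "composer", "sheet"],
]
theta = corpus_unigram(docs)
print(round(fisher_kernel(docs[0], docs[1], theta), 3))  # 32/27, prints 1.185
print(fisher_kernel(docs[0], docs[2], theta))            # no shared words, prints 0
```

Because the inverse Fisher information weights each shared word by 1/θ_w, rare words contribute more to the similarity than frequent ones, which gives an IDF-like effect. Under this model, keeping the sum-to-one constraint (for example with a softmax parameterization and the pseudo-inverse of the resulting Fisher matrix) only subtracts the constant 1 from every kernel value.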
Similar resources
Fisher Kernels and Probabilistic Latent Semantic Models. Thèse No. 4647 (2010), École Polytechnique Fédérale de Lausanne
Tasks that rely on semantic content of documents, notably Information Retrieval and Document Classification, can benefit from a good account of document context, i.e. the semantic association between documents. To this effect, the scheme of latent semantics blends individual words appearing throughout a document collection into latent topics, thus providing a way to handle documents that is les...
Using Fisher Kernels and Hidden Markov Models for the Identification of Famous Composers from their Sheet Music
We present a novel application of Fisher kernels to the problem of identifying famous composers from their sheet music. The characteristics of the composers' writing style are obtained from note changes on a basic beat level, combined with the notes' hidden harmony. We are able to extract this information by the application of a Hidden Markov Model to learn the underlying probabilistic structure ...
Information Diffusion Kernels
A new family of kernels for statistical learning is introduced that exploits the geometric structure of statistical models. Based on the heat equation on the Riemannian manifold defined by the Fisher information metric, information diffusion kernels generalize the Gaussian kernel of Euclidean space, and provide a natural way of combining generative statistical modeling with non-parametric discr...
Diffusion Kernels on Statistical Manifolds
A family of kernels for statistical learning is introduced that exploits the geometric structure of statistical models. The kernels are based on the heat equation on the Riemannian manifold defined by the Fisher information metric associated with a statistical family, and generalize the Gaussian kernel of Euclidean space. As an important special case, kernels based on the geometry of multinomia...
Deriving TF-IDF as a Fisher Kernel
The Dirichlet compound multinomial (DCM) distribution has recently been shown to be a good model for documents because it captures the phenomenon of word burstiness, unlike standard models such as the multinomial distribution. This paper investigates the DCM Fisher kernel, a function for comparing documents derived from the DCM. We show that the DCM Fisher kernel has components that are similar...